62 research outputs found
Workflow-Management im Web
The Web is becoming increasingly important as a basis for application development. Numerous advantages, such as availability on many system platforms and standardized clients, speak in favor of Web-based solutions. Web clients are therefore also a natural choice as the user interface for workflow management systems (WfMS). This work clarifies which special requirements WfMS have and how these can be reconciled with the particular characteristics of Web applications. It also presents and evaluates approaches to Web-based WfMS. A comparison of the Web interfaces of existing WfMS and a discussion of the possible uses of Web clients conclude the work.
Detection of phrase boundaries and accents
Experiments on the recognition of phrase boundaries and phrase accents were performed on a large speech database read by untrained speakers. We used durational features as well as features derived from pitch and energy contours and pause information. Different sets of features were compared. A recognition rate of 75.7% was achieved for distinguishing three different boundary classes, and a recognition rate of 88.7% for distinguishing accentuated from unaccentuated syllables.
Syntactic-prosodic labeling of large spontaneous speech data-bases
In automatic speech understanding, the division of continuously running speech into syntactic chunks is a major problem. Syntactic boundaries are often marked by prosodic means. Large databases are necessary for training statistical models of prosodic boundaries. For the German Verbmobil project (automatic speech-to-speech translation), we developed a syntactic-prosodic labeling scheme in which two main types of boundaries (major syntactic boundaries and syntactically ambiguous boundaries) and some other special boundaries are labeled for a large Verbmobil spontaneous speech corpus. We compare the results of classifiers (multilayer perceptrons and language models) trained on these syntactic-prosodic boundary labels with classifiers trained on perceptual-prosodic and pure syntactic labels. The main advantage of the rough syntactic-prosodic labels presented in this paper is that large amounts of data could be labeled within a short time. Therefore, the classifiers trained with these labels turned out to be superior (recognition rates of up to 96%).
Prosodic processing and its use in Verbmobil
We present the prosody module of the VERBMOBIL speech-to-speech translation system, the first complete system worldwide that successfully uses prosodic information in the linguistic analysis. This is achieved by computing probabilities for clause boundaries, accentuation, and different types of sentence mood for each of the word hypotheses computed by the word recognizer. These probabilities guide the search of the linguistic analysis. Disambiguation is already achieved during the analysis and not by a prosodic verification of different linguistic hypotheses. So far, the most useful prosodic information is provided by clause boundaries. These are detected with a recognition rate of 94%. For the parsing of word hypotheses graphs, the use of clause boundary probabilities yields a speed-up of 92% and a 96% reduction of alternative readings.
Dialog act classification with the help of prosody
This paper presents automatic methods for the segmentation and classification of dialog acts (DAs). In Verbmobil it is often sufficient to recognize the sequence of DAs occurring during a dialog between the two partners. Since a turn can consist of one or more successive DAs, we classify DAs in a two-step procedure: first, each turn has to be segmented into units that each correspond to a DA, and second, the DA categories have to be identified. For the segmentation we use polygrams and multilayer perceptrons with prosodic features. The classification of DAs is done with semantic classification trees and polygrams.
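The two-step procedure described above can be sketched as a small pipeline. This is only an illustrative sketch, not the method of the paper: the function names, the threshold, and the toy stand-ins for the boundary model and the DA classifier are all assumptions. In the paper, segmentation relies on polygrams and multilayer perceptrons, and classification on semantic classification trees and polygrams.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type aliases: neither name comes from the paper.
BoundaryModel = Callable[[Sequence[str], int], float]  # P(DA boundary after word i)
SegmentClassifier = Callable[[Sequence[str]], str]     # segment -> DA category


def segment_turn(words: Sequence[str], boundary_prob: BoundaryModel,
                 threshold: float = 0.5) -> List[List[str]]:
    """Step 1: split a turn after every word whose boundary probability
    reaches the (assumed) threshold."""
    segments: List[List[str]] = []
    current: List[str] = []
    for i, word in enumerate(words):
        current.append(word)
        is_last = i == len(words) - 1
        if not is_last and boundary_prob(words, i) >= threshold:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def classify_turn(words: Sequence[str], boundary_prob: BoundaryModel,
                  classify_segment: SegmentClassifier,
                  threshold: float = 0.5) -> List[Tuple[List[str], str]]:
    """Step 2: assign a DA category to each segment found in step 1."""
    return [(seg, classify_segment(seg))
            for seg in segment_turn(words, boundary_prob, threshold)]


# Toy stand-ins for the trained models (assumptions for illustration only).
def toy_boundary_prob(words: Sequence[str], i: int) -> float:
    return 0.9 if words[i] == "okay" else 0.1


def toy_classify_segment(segment: Sequence[str]) -> str:
    return "SUGGEST" if "how" in segment else "ACCEPT"


turn = "yes okay how about monday".split()
result = classify_turn(turn, toy_boundary_prob, toy_classify_segment)
```

With the toy models, the turn is split after "okay" and the two segments receive different categories. Keeping the two steps behind separate callables means either model can be replaced without touching the other.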
Automatic classification of prosodically marked phrase boundaries in German
A large corpus has been created automatically and read by speakers. Phrase boundaries were labeled in the sentences automatically during sentence generation. Perception experiments on a subset of 500 utterances showed a high agreement between the automatically generated boundary markers and the ones perceived by listeners. Gaussian distribution and polynomial classifiers were trained on a set of prosodic features computed from the speech signal using the automatically generated boundary markers. Comparing the classification results with the judgments of the listeners yielded a recognition rate of 87%. A combination with stochastic language models improved the recognition rate to 90%. We found that the pause and the durational features are most important for the classification, but that the influence of F0 is not negligible.
Prosodic modules for speech recognition and understanding in VERBMOBIL
Within VERBMOBIL, a large project on spoken language research in Germany, two modules for detecting and recognizing prosodic events have been developed. One module operates on speech signal parameters and the word hypothesis graph, whereas the other module, designed for a novel, highly interactive architecture, only uses speech signal parameters as its input. Phrase boundaries, sentence modality, and accents are detected. In spontaneous dialogs, the recognition rates are up to 82.5% for accents and up to 91.7% for phrase boundaries.
Prosody takes over : towards a prosodically guided dialog system
The domain of the speech recognition and dialog system EVAR is train timetable inquiry. We observed that in real human-human dialogs the customer very often interrupts while the officer is transmitting the information. Many of these interruptions are just repetitions of the time of day given by the officer. The functional role of these interruptions is often determined by prosodic cues only. An important result of experiments in which naive persons used the EVAR system is that it is hard to follow the train connection information given via speech synthesis. In this case it is even more important than in human-human dialogs that the user has the opportunity to interact during the answer phase. Therefore we extended the dialog module to allow the user to repeat the time of day, and we added a prosody module that guides the continuation of the dialog by analyzing the intonation contour of this utterance.
Prosodic scoring of word hypotheses graphs
Prosodic boundary detection is important for disambiguation during parsing, especially in spontaneous speech, where elliptic sentences occur frequently. Word graphs are an efficient interface between word recognition and the parser. Prosodic classification of word chains has been published earlier. The adjustments necessary for applying these classification techniques to word graphs are discussed in this paper. When classifying a word hypothesis, a set of context words has to be determined appropriately. A method has been developed to use stochastic language models for prosodic classification. This method has also been adapted for use on word graphs. We also improved the set of acoustic-prosodic features, which reduced the recognition errors by about 60% on the read speech we were working on previously, now achieving a 10% error rate for 3 boundary classes and 3% for 2 accent classes. Moving to spontaneous speech, the recognition error increases significantly (e.g. 16% for a 2-class boundary task). We show that even on word graphs the combination of language models, which model a larger context, with acoustic-prosodic classifiers reduces the recognition error by up to 50%.
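The abstract does not state how the language model and the acoustic-prosodic classifier are combined. One common way to merge two probabilistic scores is a weighted log-linear combination, sketched below purely as an assumption. The function name, the `lm_weight` parameter, and the example probabilities are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict


def combine_scores(p_acoustic: Dict[str, float], p_lm: Dict[str, float],
                   lm_weight: float = 1.0) -> Dict[str, float]:
    """Log-linear combination of two class posteriors (assumed scheme).

    Both inputs map the same class labels to strictly positive
    probabilities; a zero probability would make math.log fail.
    """
    log_scores = {
        label: math.log(p_acoustic[label]) + lm_weight * math.log(p_lm[label])
        for label in p_acoustic
    }
    # Subtract the maximum before exponentiating for numerical stability.
    max_log = max(log_scores.values())
    unnormalized = {label: math.exp(s - max_log) for label, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {label: value / total for label, value in unnormalized.items()}


# Made-up example: the acoustic classifier slightly prefers "no_boundary",
# while the language model, which sees a larger word context, prefers "boundary".
p_acoustic = {"boundary": 0.4, "no_boundary": 0.6}
p_lm = {"boundary": 0.8, "no_boundary": 0.2}

combined = combine_scores(p_acoustic, p_lm)
decision = max(combined, key=combined.get)
```

In this made-up case the language model evidence overrides the weak acoustic preference. In practice a weight such as `lm_weight` would be tuned on held-out data.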